MUC-5 evaluation metrics

Authors

  • Nancy Chinchor
  • Beth Sundheim
Abstract

The MUC-5 Scoring System is evaluation software that aligns and scores the templates produced by the information extraction systems under evaluation in comparison to an "answer key" created by humans. The Scoring System produces comprehensive summary reports showing the overall scores for the templates in the test set; these may be supplemented by detailed score reports showing scores for each template individually. Figure 1 shows a sample summary score report in the joint ventures task domain for the error metrics; Figure 2 shows a corresponding summary score report for the recall-precision metrics.
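
To make the recall-precision metrics mentioned in the abstract concrete, here is a minimal Python sketch. It assumes the definitions commonly given for MUC scoring, in which a partially correct fill earns half credit; the abstract itself does not state the formulas. The class name `SlotTallies`, its field names, and the example counts are hypothetical and are not taken from the Scoring System.

```python
from dataclasses import dataclass


@dataclass
class SlotTallies:
    """Hypothetical container for response counts; names are illustrative only."""
    correct: int
    partial: int
    incorrect: int
    missing: int
    spurious: int

    @property
    def possible(self) -> int:
        # Fills the answer key expects (assumed definition).
        return self.correct + self.partial + self.incorrect + self.missing

    @property
    def actual(self) -> int:
        # Fills the system actually produced (assumed definition).
        return self.correct + self.partial + self.incorrect + self.spurious


def recall(t: SlotTallies) -> float:
    # Half credit for partial matches is an assumption of this sketch.
    return (t.correct + 0.5 * t.partial) / t.possible if t.possible else 0.0


def precision(t: SlotTallies) -> float:
    return (t.correct + 0.5 * t.partial) / t.actual if t.actual else 0.0


# Made-up counts, for illustration only.
example = SlotTallies(correct=6, partial=2, incorrect=1, missing=1, spurious=2)
print(f"recall={recall(example):.3f} precision={precision(example):.3f}")
```

Recall here is normalised by what the answer key expects and precision by what the system produced, so missing fills lower only recall and spurious fills lower only precision.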

Similar resources

MUC-4 evaluation metrics

The MUC-4 evaluation metrics measure the performance of the message understanding systems. This paper describes the scoring algorithms used to arrive at the metrics as well as the improvements that were made to the MUC-3 methods. MUC-4 evaluation metrics were stricter than those used in MUC-3. Given the differences in scoring between MUC-3 and MUC-4, the MUC-4 systems' scores represent a lar...

MUC-3 evaluation metrics

The MUC-3 evaluation metrics are measures of performance for the MUC-3 template fill task. Obtaining summary measures of performance necessitates the loss of information about many details of performance. The utility of summary measures for comparison of performance over time and across systems should outweigh this loss of detail. The template fill task is complex because of the varying natu...

Survey Of The Message Understanding Conferences

In this paper, the Message Understanding Conferences are reviewed, and the natural language system evaluation that is underway in preparation for the next conference is described. The role of the conferences in the evaluation of information extraction systems is assessed in terms of the purposes of three broad classes of evaluation: progress, adequacy, and diagnostic. The conferences have measu...

Critical Reflections on Evaluation Practices in Coreference Resolution

In this paper we revisit the task of quantitative evaluation of coreference resolution systems. We review the most commonly used metrics (MUC, B³, CEAF and BLANC) on the basis of their evaluation of coreference resolution in five texts from the OntoNotes corpus. We examine both the correlation between the metrics and the degree to which our human judgement of coreference resolution agrees with t...

Instance Sampling for Multilingual Coreference Resolution

In this paper we investigate the effect of downsampling negative training instances on a multilingual memory-based coreference resolution approach. We report results on the SemEval-2010 task 1 data sets for six different languages (Catalan, Dutch, English, German, Italian and Spanish) and for four evaluation metrics (MUC, B³, CEAF, BLANC). Our experiments show that downsampling negative training...

Publication date: 1993